We consider a nonlinear regression model for which the variances depend
on a parametric function of known variables. We focus on estimating the
variance function, after which it is typical to estimate the mean function by
weighted least squares. Most often, squared residuals from an unweighted
least squares fit are compared to their expectations and used to estimate the
variance function. If properly weighted, such methods are asymptotically
equivalent to normal-theory maximum likelihood. Instead, one could use the
deviations of the absolute residuals from their expectations. We construct such
an estimator of the variance function based on absolute residuals whose
asymptotic efficiency relative to maximum likelihood is precisely the same for
symmetric errors as the asymptotic efficiency in the one-sample problem of the
mean absolute deviation relative to the sample variance. The estimators are
computable using nonlinear least squares software. The results hold with
minimal distributional assumptions.
1. Introduction

Consider a possibly nonlinear regression model in which the variances
of the responses are not constant, but are parametric functions
of predictor variables. More specifically, we have independent observations
$Y_1, Y_2, \ldots, Y_N$ following the mean-variance model

(1.1)    $E\,Y_i = f(x_i,\beta), \qquad \mathrm{Variance}(Y_i) = \xi\, h(z_i,\theta).$

In (1.1), $\{x_i, z_i\}$ are fixed constants, $\beta$ is the vector regression parameter,
$\theta$ is a vector of q components, f is called the regression function and h is
called the variance function. See Judge, et al. (1985, Chapter 11) for a
recent theoretical discussion of this basic heteroscedastic regression model,
and Montgomery & Peck (1982, pages 99-104) for a simple example. If the
structural parameter $\theta$ were known, redefining $Y_i^* = Y_i / h^{1/2}(z_i,\theta)$ and
$f^*(z_i, x_i, \beta) = f(x_i,\beta)/h^{1/2}(z_i,\theta)$ would yield a homoscedastic regression
model which can be fit by one's favorite method.


We are interested in the case that the structural parameter $\theta$ is
unknown. Given an estimate $\hat\theta$ of $\theta$, the usual device for estimating the
regression parameter is simply to pretend that $\theta$ is known and equal to
$\hat\theta$, and then proceed as in the previous paragraph. The resulting estimate of
$\beta$ will be called generalized least squares. It is one of the great folklore
theorems of statistics which assures us that for estimating $\beta$, it really
does not matter how we estimate $\theta$, at least asymptotically. More precisely,
for large sample sizes the limiting distribution of generalized least
squares is the same as if $\theta$ were known.

Despite the folklore asymptotics, as intuition would indicate, for finite
samples how one estimates $\theta$ really matters. Williams (1975) states that "both
analytic and empirical studies of a variety of linear models indicate that
... the ordering by efficiency of (estimators of $\beta$) ... in small samples is
in accordance with the ordering by efficiency (of estimates of $\theta$)". In
the linear model, Toyooka (1982), Rothenberg (1984) and Kariya (1985) all
essentially show that for normally distributed data, the second order
covariance matrix of generalized least squares is a monotonically increasing
function of the covariance of the estimate of $\theta$; see also Freedman & Peters
(1984) and Carroll & Ruppert (1985) for similar results. Finally, the
Monte-Carlo study of Goldfeld & Quandt (1972, pages 96-120) in particular shows
that it is possible to construct a disastrously inefficient generalized least
squares estimator as well as quite an efficient one.

The purpose of this paper is to compare various estimators of $\theta$ by asymptotic
efficiency. Without making any further assumptions than the minimal (1.1), it is
possible to construct consistent and asymptotically normal estimates of $\theta$ with the
following "algorithm":

(1.2)    (1) Estimate $\beta$, obtain $\hat\beta$;
         (2) Form squared residuals $r_i^2 = [Y_i - f(x_i,\hat\beta)]^2$;
         (3) Estimate $\theta$ by a function of the squared residuals.

Three common methods have been proposed, see Hildreth & Houck (1976),
Amemiya (1977), Dent & Hildreth (1977), Jobson & Fuller (1980), Goldfeld &
Quandt (1977), Harvey (1976) and Theil (1971), among others. The first
method is based on pretending that the data are normally distributed, in which
case we can compute the maximum likelihood estimator $\hat\theta_{ML}$. If $\hat\beta$ in (1.2) is
the maximum likelihood estimator, then $\hat\theta_{ML}$ solves

(1.3)    $0 = N^{-1/2} \sum_{i=1}^{N} \left[ \frac{r_i^2}{\xi\, h(z_i,\theta)} - 1 \right] \begin{pmatrix} 1 \\ \xi\, v_i(\theta) \end{pmatrix},$

where

         $v_i = v_i(\theta) = \frac{\partial}{\partial\theta} \log h(z_i,\theta).$

Actually, the asymptotic distribution of solutions to (1.3) remains the
same if the estimator of $\beta$ satisfies

(1.4)    $N^{1/2}(\hat\beta - \beta) = O_p(1),$

a fact sketched in the appendix. The logic of (1.3) is that $E\,r_i^2 \approx \xi\, h(z_i,\theta)$. The
other methods are also based on the idea that $\xi\, h(z_i,\theta)$ is approximately the
expectation of squared residuals, see Jobson & Fuller (1980). For simplicity of
presentation, we will ignore the asymptotically negligible bias in the squared residuals
due to leverage. The unweighted estimate $\hat\theta_{LS}$ minimizes in $(\xi,\theta)$

(1.5)    $\sum_{i=1}^{N} \left[ r_i^2 - \xi\, h(z_i,\theta) \right]^2,$

while the weighted estimate $\hat\theta_{WLS}$ minimizes in $(\xi,\theta)$

(1.6)    $\sum_{i=1}^{N} \left[ r_i^2 - \xi\, h(z_i,\theta) \right]^2 / h^2(z_i,\hat\theta_{LS}).$


This last estimator is motivated by the idea that the variance of squared
residuals is approximately proportional to $h^2(z_i,\theta)$, so that some sort of
weighting ought to be employed. Also note that differentiating (1.6) yields

         $0 = \sum_{i=1}^{N} \left[ \frac{r_i^2}{\xi\, h(z_i,\theta)} - 1 \right] \begin{pmatrix} 1 \\ \xi\, v_i(\theta) \end{pmatrix} \frac{h^2(z_i,\theta)}{h^2(z_i,\hat\theta_{LS})},$

which differs from the likelihood equation (1.3) only by the asymptotically
negligible factors $[h(z_i,\theta)/h(z_i,\hat\theta_{LS})]^2$. In Theorem 1 of the next section, we
investigate these three estimators of $\theta$, proving that (1) for all three the
estimate of $\beta$ is immaterial asymptotically as long as (1.4) holds; (2) maximum
likelihood $\hat\theta_{ML}$ and weighted squared residuals $\hat\theta_{WLS}$ have the same limit
distributions; and (3) both $\hat\theta_{ML}$ and $\hat\theta_{WLS}$ are asymptotically more efficient
than using unweighted squared residuals via $\hat\theta_{LS}$. These results are obtained
essentially independently of the underlying distribution.

Squared residuals are skewed and long-tailed. For this reason,
Cohen, et al. (1984) suggest the use of absolute residuals, although they
use absolute residuals only as one part of their algorithm and eventually
use squared residuals. For the special case that $\mathrm{Variance}(Y_i) = g(Z_i,\theta) =
(Z_i^T\theta)^2$, apparently for computational reasons Glejser (1969) and Theil
(1971) also propose use of absolute residuals. Such use requires a further
assumption, namely that

(1.7)    $E\,|Y_i - f(x_i,\beta)| = \eta\, h^{1/2}(z_i,\theta).$

Effectively, (1.1) and (1.7) require that

(1.8)    $\varepsilon_i = \frac{Y_i - f(x_i,\beta)}{\{\xi\, h(z_i,\theta)\}^{1/2}}$

be independent and identically distributed, an assumption we shall make
from now on.

Mimicking (1.5) and (1.6), one can construct two estimators of $\theta$ based
on absolute residuals. Noting from (1.7)-(1.8) that absolute residuals
have approximate expectation $\eta\, h^{1/2}(z_i,\theta)$ and variance proportional to
$h(z_i,\theta)$, the unweighted absolute residual estimator $\hat\theta_{AV}$ minimizes in $(\eta,\theta)$

(1.9)    $\sum_{i=1}^{N} \left( |r_i| - \eta\, h^{1/2}(z_i,\theta) \right)^2,$

while the weighted version $\hat\theta_{WAV}$ minimizes

(1.10)   $\sum_{i=1}^{N} \left( |r_i| - \eta\, h^{1/2}(z_i,\theta) \right)^2 / h(z_i,\hat\theta_{AV}).$

For the special case that the standard deviation is linear in
exogenous variables, Judge, et al. (1985) propose our general absolute
residual estimators. Even in this special case, they state that the
properties of $\hat\theta_{AV}$ and $\hat\theta_{WAV}$ "have not been fully investigated". In their
specific context, they go on to make in effect three conjectures:

(a) Absolute residual estimators of $\theta$ are not affected by the method
    of estimating $\beta$, as long as (1.4) holds;

(b) Weighted absolute residuals $\hat\theta_{WAV}$ are more efficient than not
    weighting and using $\hat\theta_{AV}$;

(c) If we define
        $\delta = \mathrm{Var}(|\varepsilon|), \qquad \kappa = \mathrm{Var}(\varepsilon^2),$
    then in the light of Theorem 1 the asymptotic relative efficiency
    of the weighted absolute residual estimator $\hat\theta_{WAV}$ with respect to
    maximum likelihood $\hat\theta_{ML}$ or weighted squared residuals $\hat\theta_{WLS}$ is

(1.11)      $\frac{\kappa\,(1-\delta)}{4\,\delta}.$
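As a quick numerical check, the efficiency in (1.11) can be evaluated for two standard error distributions. The sketch assumes the efficiency takes the form $\kappa(1-\delta)/(4\delta)$ with $\delta = \mathrm{Var}(|\varepsilon|)$ and $\kappa = \mathrm{Var}(\varepsilon^2)$ for errors scaled to unit variance; the function name `are_abs_vs_squared` is hypothetical, and the moment values in the comments are standard facts about the normal and double exponential distributions.

```python
import math


def are_abs_vs_squared(delta, kappa):
    """Return kappa * (1 - delta) / (4 * delta).

    delta = Var|e| and kappa = Var(e^2), for errors e with mean 0, variance 1.
    """
    return kappa * (1.0 - delta) / (4.0 * delta)


# Standard normal errors: E|e| = sqrt(2/pi), E e^4 = 3.
delta_normal = 1.0 - 2.0 / math.pi
kappa_normal = 3.0 - 1.0

# Double exponential errors with scale b chosen so that 2 b^2 = 1:
# E|e| = b, so (E|e|)^2 = 1/2, and E e^4 = 24 b^4 = 6.
delta_laplace = 1.0 - 0.5
kappa_laplace = 6.0 - 1.0

are_normal = are_abs_vs_squared(delta_normal, kappa_normal)
are_laplace = are_abs_vs_squared(delta_laplace, kappa_laplace)
print(round(are_normal, 3), round(are_laplace, 3))   # prints 0.876 1.25
```

These values agree with the 12% loss under normality and the 25% gain under the double exponential distribution quoted in Section 3.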


In this paper, we verify all these conjectures when the errors (1.8)
are symmetrically distributed. In Section 3, we discuss why it is that, for
this special case, using absolute residuals may be preferred when viewed
from a perspective of efficiency robustness.

We also show that conjecture (a) and hence conjecture (c) are
false in general for asymmetrically distributed errors. While the
dependence of the asymptotic distribution on the estimate of $\beta$ certainly
complicates the theory, the dependence does not disqualify using absolute
residuals. We exhibit a simple example for which using absolute residuals
is always more than twice as efficient as using squared residuals.

The theorems are stated in the next section, with proofs in the
appendix. In the third section we discuss the statistical implications
of the results.
2. Major Results

We state the results somewhat informally and in the appendix only sketch
the proofs, relying for the most part on simple Taylor series expansions and
the somewhat more complex linearizations in Ruppert & Carroll (1980) and Carroll
& Ruppert (1982). We first consider the estimators $\hat\theta_{ML}$, $\hat\theta_{LS}$ and $\hat\theta_{WLS}$
based on squared residuals. Under minimal assumptions such as (1.1) and
(1.4), these estimators are consistent and asymptotically normally
distributed with asymptotic covariance unaffected by the choice of $\hat\beta$.

The covariance simplifies if the errors (1.8) are independent and identically
distributed, an assumption we will make throughout. Define $\bar v = N^{-1}\sum_{i=1}^{N} v_i$ and

         $\Sigma_{ML}^{-1} = N^{-1} \sum_{i=1}^{N} (v_i - \bar v)(v_i - \bar v)^T.$

Theorem #1. Under (1.4) and further regularity conditions, maximum
likelihood $\hat\theta_{ML}$ and weighted squared residuals $\hat\theta_{WLS}$ have the same asymptotic
distributions, with

         $N^{1/2}(\hat\theta_{ML} - \theta) \to_L N(0, \kappa\,\Sigma_{ML}),$
         $N^{1/2}(\hat\theta_{WLS} - \theta) \to_L N(0, \kappa\,\Sigma_{ML}),$

where "$\to_L$" means convergence in distribution. Further, unweighted squared
residuals satisfies

         $N^{1/2}(\hat\theta_{LS} - \theta) \to_L N(0, \kappa\,\Sigma_{LS}),$

where $\Sigma_{LS} \ge \Sigma_{ML}$.

Thus, Theorem #1 assures us that we should weight the squared residuals
for greatest asymptotic efficiency, at least when the errors are independent
and identically distributed. The next result relates to the case of
symmetric errors (1.8), thereby proving the three conjectures of Judge,
et al. (1985) in this special case.


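Theorem #2. Suppose that the errors (1.8) are symmetrically distributed
with a distribution function which is continuous in a neighborhood of zero.
Under (1.4) and further regularity conditions,

         $N^{1/2}(\hat\theta_{WAV} - \theta) \to_L N(0, \Sigma_{WAV})$

and

         $N^{1/2}(\hat\theta_{AV} - \theta) \to_L N(0, \Sigma_{AV}).$

Further, $\Sigma_{AV} \ge \Sigma_{WAV}$, and

         $\Sigma_{WAV} = \frac{4\,\delta}{1-\delta}\, \Sigma_{ML}.$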


If the errors are not symmetrically distributed, absolute residual
estimation of $\theta$ is affected by how one estimates $\beta$, see Carroll & Schneider
(1985) for a similar example of this phenomenon. Define

         $\gamma = \Pr(\varepsilon > 0) - \Pr(\varepsilon < 0).$

Recall that, by assumption, $\theta$ is a vector of q components. Define the
matrices

         $C_p^* = N^{-1} \sum_{i=1}^{N} \begin{pmatrix} 1 \\ \eta\, v_i/2 \end{pmatrix} \begin{pmatrix} 1 & \eta\, v_i^T/2 \end{pmatrix} h^{p-1}(z_i,\theta),$

         $\dot f(x_i,\beta) = \frac{\partial}{\partial\beta} f(x_i,\beta),$

         $e = (\,0_{(q \times 1)} \;\; I\,),$

where I is the q x q identity matrix. Finally, define

         $\ell_i = \begin{pmatrix} 1 \\ \eta\, v_i/2 \end{pmatrix}.$

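Theorem #3. Under (1.4) and further regularity conditions, we have the
asymptotic expansions

         $N^{1/2}(\hat\theta_{AV} - \theta) = e\, C_2^{*-1} \Big\{ \xi^{1/2} N^{-1/2} \sum_{i=1}^{N} \ell_i\, h(z_i,\theta)\,(|\varepsilon_i| - E|\varepsilon_i|)$
         $\qquad\qquad - \gamma\, N^{-1} \sum_{i=1}^{N} \ell_i\, h^{1/2}(z_i,\theta)\, \dot f^T(x_i,\beta)\, N^{1/2}(\hat\beta - \beta) \Big\} + o_p(1),$

         $N^{1/2}(\hat\theta_{WAV} - \theta) = e\, C_1^{*-1} \Big\{ \xi^{1/2} N^{-1/2} \sum_{i=1}^{N} \ell_i\,(|\varepsilon_i| - E|\varepsilon_i|)$
         $\qquad\qquad - \gamma\, N^{-1} \sum_{i=1}^{N} \ell_i\, h^{-1/2}(z_i,\theta)\, \dot f^T(x_i,\beta)\, N^{1/2}(\hat\beta - \beta) \Big\} + o_p(1).$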


Theorem #2 is a corollary of Theorem #3 because, under symmetry, $\gamma = 0$
and the effect of $\hat\beta$ disappears. In general, when $\gamma \ne 0$, $\hat\theta_{AV}$ and $\hat\theta_{WAV}$ are
still asymptotically normally distributed, but their covariance matrices
will depend on the method used to estimate $\beta$.


3. Discussion

In Section 2, we have shown that for estimating
the structural parameter $\theta$ in the variance function, normal theory maximum
likelihood is asymptotically equivalent to weighting squared residuals and
applying a nonlinear least squares algorithm. This result holds essentially
independently of the underlying distributions of the errors $\{\varepsilon_i\}$ in
(1.8), which need not even be identically distributed. If the errors are
identically distributed, then both methods are asymptotically more efficient
than ordinary least squares. In practice, this means that if
computing maximum likelihood is inconvenient as in Froehlich (1973) or Dent
& Hildreth (1977), then in fitting squared residuals one ought to
weight.


We have also shown that, if the errors (1.8)
are independent and identically distributed symmetric random variables,
then by appropriate weighting one can construct an estimator $\hat\theta_{WAV}$ which
has asymptotic efficiency (1.11) relative to maximum likelihood $\hat\theta_{ML}$ and
weighted squared residuals $\hat\theta_{WLS}$. For symmetric distributions, in one
sample problems (1.11) is the asymptotic relative efficiency of the mean
absolute deviation with respect to the sample variance, see Huber (1981,
pages 2-3). For normally distributed data, using absolute residuals is
12% less efficient than using squared residuals. However, for the longer-tailed
double exponential distribution, using absolute residuals is 25%
more efficient. Huber (1981, page 3) presents an interesting computation
of (1.11) for the class of contaminated normal distributions

         $(1 - \alpha)\,\Phi(\varepsilon) + \alpha\,\Phi(\varepsilon/3),$

where $\Phi$ is the normal distribution function. This distribution arises
when a random fraction $\alpha$ of clean normally distributed data is contaminated
by normal data with three times larger standard deviation, and it is
commonly used in robustness studies. The relative efficiency of absolute
values as a function of the contamination fraction $\alpha$ is given as follows:

    $\alpha$        Relative efficiency
    0.0          87.6%
    0.001        94.8%
    0.002       101.6%
    0.01        143.9%
    0.05        203.5%
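The tabled efficiencies can be reproduced from the moments of the contaminated normal distribution. The sketch below is an illustration rather than Huber's own computation: it assumes the efficiency has the form $\kappa(1-\delta)/(4\delta)$, with $\delta = \mathrm{Var}(|\varepsilon|)$ and $\kappa = \mathrm{Var}(\varepsilon^2)$ evaluated after rescaling the mixture to unit variance. The function name `are_contaminated` is hypothetical.

```python
import math


def are_contaminated(alpha, scale=3.0):
    """Efficiency of absolute vs squared residuals under contamination.

    The error law is (1 - alpha) N(0, 1) + alpha N(0, scale^2).
    """
    # Raw moments of the mixture.
    m_abs = math.sqrt(2.0 / math.pi) * ((1.0 - alpha) + alpha * scale)
    m2 = (1.0 - alpha) + alpha * scale ** 2
    m4 = 3.0 * ((1.0 - alpha) + alpha * scale ** 4)
    # Moments after standardizing the errors to unit variance.
    delta = 1.0 - m_abs ** 2 / m2      # Var|e|
    kappa = m4 / m2 ** 2 - 1.0         # Var(e^2)
    return kappa * (1.0 - delta) / (4.0 * delta)


for alpha in (0.0, 0.001, 0.002, 0.01, 0.05):
    print(alpha, round(100.0 * are_contaminated(alpha), 1))
```

The crossover near $\alpha = 0.002$ is visible directly: the fourth moment of the mixture grows with $\alpha$ much faster than its first absolute moment.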

Huber  calls  these  numbers  "disquieting",  noting  that  just  2  "bad"  observations 
in  1000  suffice  to  offset  the  superiority  of  squared  over  absolute  residuals 
when  estimating  the  variance  function. 

If  the  errors  are  symmetrically  distributed  or  nearly  so,  then  robustness 
of  efficiency  considerations  strongly  suggest  using  weighted  absolute  residuals 
to  estimate  the  variance  function  rather  than  weighted  squared  residuals 
or  normal  theory  maximum  likelihood.  Computation  is  not  intrinsically 
difficult  since  it  is  based  on  the  usual  nonlinear  least  squares  methodology. 

The residuals are defined through an estimate of the regression parameter
$\beta$. The estimation of the variance function using squared residuals is
asymptotically unaffected by the estimate of $\beta$. The same can be said for
absolute residuals only when the errors are symmetrically distributed. Clearly,
the use of absolute residuals is complicated and more research is needed in
this direction. That further research may be quite useful is seen in the following
example. Let $\{W_i\}$ be independent and identically distributed negative
exponential random variables with mean one. Consider the model of
heteroscedastic regression through the origin,

         $Y_i = x_i\,\beta + \{\xi\, x_i^{\theta}\}^{1/2}\,(W_i - 1).$

In this case, $h(z_i,\theta) = x_i^{\theta}$, $v_i = \log x_i$ and writing

         $\mathrm{Var}(v) = N^{-1} \sum_{i=1}^{N} (v_i - \bar v)^2,$

normal theory maximum likelihood satisfies

         $N^{1/2}(\hat\theta_{ML} - \theta) \to_L N(0,\; 9.0/\mathrm{Var}(v)).$

When $\theta = 2$, simple calculations show that the estimate of $\beta$ does not matter
and that

         $N^{1/2}(\hat\theta_{WAV} - \theta) \to_L N(0,\; 3.4/\mathrm{Var}(v)).$

Writing

         $a_0 = N^{-1} \sum_{i=1}^{N} x_i^{2-\theta}, \qquad b_0 = N^{-1} \sum_{i=1}^{N} (v_i - \bar v)\, x_i^{1-\theta/2},$

we find that if $\hat\beta$ is any generalized least squares estimate, then from the
appendix,

         $N^{1/2}(\hat\theta_{WAV} - \theta) \to_L N\!\left(0,\; \frac{3.4}{\mathrm{Var}(v)} + \frac{2.37\, b_0^2}{\{\mathrm{Var}(v)\}^2\, a_0}\right),$
indicating the asymptotic superiority of using absolute residuals as long
as

(3.1)    $b_0^2 < 2.36\, a_0\, \mathrm{Var}(v).$

By the Cauchy-Schwarz inequality, $b_0^2 \le a_0\, \mathrm{Var}(v)$, and from (3.1) we see that
using absolute residuals will always be more than two times as efficient as
the MLE or squared residuals.

The point of the previous example is that absolute residuals estimation
of $\theta$ should not be automatically dismissed simply because it has an
inconvenient asymptotic theory under asymmetric errors. As long as one can
reasonably make the crucial assumption (1.7), using weighted absolute
residuals to estimate the variance function should be given serious
consideration. However, further research is needed to help the statistician
choose between using weighted squared or absolute residuals when asymmetry
is present.

We have confined our discussion to weighted least squares estimation
of $\beta$ and absolute versus squared residuals for estimating the variance
function. Our techniques apply to other methods, including using weighted
logarithms of squared residuals and the robust estimation schemes of Carroll
& Ruppert (1982) and Giltinan, Carroll & Ruppert (1986).
Proofs of the main results

To keep this section somewhat self-contained, there is some redundancy
with the text. Let $\hat\beta$ be any estimator satisfying

(A.1)    $N^{1/2}(\hat\beta - \beta) = O_p(1).$

Let $\{a_i\}$ be any sequence of constants. Define

         $r_i = Y_i - f(x_i,\hat\beta)$

         $s_i = Y_i - f(x_i,\beta) = \{\xi\, h(z_i,\theta)\}^{1/2}\, \varepsilon_i$

         $\gamma = \Pr(\varepsilon > 0) - \Pr(\varepsilon < 0)$

         $c_i = \dot f(x_i,\beta) = \frac{\partial}{\partial\beta} f(x_i,\beta)$

b .  =  a . {  i  h(z  . ,  0)  } 1 
li  l 


di  55  Ci^  h^zi* 


r  i/2 


Lemma #1. Under regularity conditions,

(A.2)    $N^{-1/2} \sum_{i=1}^{N} a_i\, r_i^2 = N^{-1/2} \sum_{i=1}^{N} a_i\, s_i^2 + o_p(1).$

Proof: This follows because

         $N^{-1/2} \sum_{i=1}^{N} a_i\,(r_i^2 - s_i^2) = N^{-1/2} \sum_{i=1}^{N} a_i\, \{ f(x_i,\hat\beta) - f(x_i,\beta) \}^2$
         $\qquad - 2\, N^{-1/2} \sum_{i=1}^{N} a_i\, \varepsilon_i\, [\xi\, h(z_i,\theta)]^{1/2}\, [ f(x_i,\hat\beta) - f(x_i,\beta) ] = o_p(1)$

by Taylor series, (A.1) and the fact that $E\,\varepsilon = 0$. From (A.2), we see
that in computing estimates of $\theta$ based on squared residuals, it is
sufficient to do the asymptotic distribution theory assuming $\beta$ is known.

Proposition #1. If the distribution function of $\{\varepsilon_i\}$ is continuous at zero,
its mean, then

         $\lim_{N\to\infty} N^{1/2}\, E\{\, |\varepsilon - v/N^{1/2}| - |\varepsilon| \,\} = -\gamma\, v.$

Proof: Routine.
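The limit in Proposition #1 can be checked numerically for the centered exponential errors $\varepsilon = W - 1$ used in the example of Section 3. The sketch relies on the closed form $E|W - c| = c - 1 + 2e^{-c}$ for a standard exponential $W$ and $c \ge 0$, which is a direct integration and is introduced here only for illustration; the function name `scaled_difference` is hypothetical.

```python
import math

# gamma = Pr(e > 0) - Pr(e < 0) for e = W - 1, W standard exponential.
GAMMA = 2.0 / math.e - 1.0


def scaled_difference(v, n):
    """Return sqrt(n) * E{ |e - v/sqrt(n)| - |e| } for e = W - 1.

    With t = v / sqrt(n), E|e - t| = E|W - (1 + t)| = t + 2 exp(-(1 + t)),
    so the difference from E|e| = 2/e is t + (2/e) * (exp(-t) - 1).
    Valid for t >= -1.
    """
    t = v / math.sqrt(n)
    # expm1 avoids cancellation when t is tiny.
    diff = t + (2.0 / math.e) * math.expm1(-t)
    return math.sqrt(n) * diff


for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, scaled_difference(1.0, n), -GAMMA * 1.0)
```

As $n$ grows the first printed value settles on $-\gamma v$, with a remainder of order $v^2/n^{1/2}$.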


Lemma #2. Make the assumption of Proposition #1. Define

         $H_N = N^{-1/2} \sum_{i=1}^{N} a_i\, |s_i| - \gamma\, N^{-1} \sum_{i=1}^{N} a_i\, \dot f(x_i,\beta)^T\, N^{1/2}(\hat\beta - \beta).$

Then

(A.3)    $N^{-1/2} \sum_{i=1}^{N} a_i\, |r_i| - H_N = o_p(1).$

Proof: Define

(A.4)    $Q_N(\Delta) = N^{-1/2} \sum_{i=1}^{N} a_i \left\{ \left| s_i - f(x_i, \beta + \Delta/N^{1/2}) + f(x_i,\beta) \right| - |s_i| \right\}.$

As in Ruppert & Carroll (1980) or Carroll & Ruppert (1982), for every
$M > 0$ we have

         $\sup_{\|\Delta\| \le M} \left| Q_N(\Delta) - E\,Q_N(\Delta) \right| = o_p(1).$

Writing

         $N^{1/2}\{ f(x_i, \beta + \Delta/N^{1/2}) - f(x_i,\beta) \} \approx c_i^T \Delta, \quad \text{where } c_i = \dot f(x_i,\beta),$

we see from Proposition #1 that

(A.5)    $E\,Q_N(\Delta) = -\gamma\, N^{-1} \sum_{i=1}^{N} a_i\, \dot f(x_i,\beta)^T \Delta + o(1).$

Substituting $N^{1/2}(\hat\beta - \beta)$ for $\Delta$ and putting (A.5) into (A.4) shows that

(A.6)    $N^{-1/2} \sum_{i=1}^{N} a_i\, |r_i| - N^{-1/2} \sum_{i=1}^{N} a_i\, |s_i| + \gamma\, N^{-1} \sum_{i=1}^{N} a_i\, \dot f(x_i,\beta)^T\, N^{1/2}(\hat\beta - \beta) = o_p(1),$

completing the proof. Lemma #2 will assure us that, when dealing with
methods based on absolute residuals, we may replace

         $N^{-1/2} \sum_{i=1}^{N} a_i\, |r_i|$

by

         $N^{-1/2} \sum_{i=1}^{N} a_i\, |s_i| - \gamma\, N^{-1} \sum_{i=1}^{N} a_i\, \dot f(x_i,\beta)^T\, N^{1/2}(\hat\beta - \beta).$

This makes the proofs routine, and eliminates the effect of the estimate
of $\beta$ when $\gamma = 0$, see Theorem #2. Define

         $v_i = v_i(\theta) = \dot h(Z_i,\theta)/h(Z_i,\theta)$

         $C_p = N^{-1} \sum_{i=1}^{N} \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} \begin{pmatrix} 1 & \xi\, v_i^T \end{pmatrix} h^{2(p-1)}(Z_i,\theta)$

         $C_p^* = N^{-1} \sum_{i=1}^{N} \begin{pmatrix} 1 \\ \eta\, v_i/2 \end{pmatrix} \begin{pmatrix} 1 & \eta\, v_i^T/2 \end{pmatrix} h^{p-1}(Z_i,\theta).$

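Proof of Theorem #1. We will study each estimator in turn, only sketching
the proof. For typing convenience, we will use the generic $(\hat\xi, \hat\theta)$, which
will refer in turn to the estimator under consideration. Because of
Lemma #1, we may assume that $\beta$ is known.

Maximum Likelihood. Using a Taylor series and Lemma #1, we have that

         $0 = N^{-1/2} \sum_{i=1}^{N} \left[ \frac{r_i^2}{\hat\xi\, h(Z_i,\hat\theta)} - 1 \right] \begin{pmatrix} 1 \\ \hat\xi\, v_i(\hat\theta) \end{pmatrix}$

         $\;\; = N^{-1/2} \sum_{i=1}^{N} \left[ \frac{r_i^2}{\xi\, h(Z_i,\theta)} - 1 \right] \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} - N^{-1} \sum_{i=1}^{N} \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} \begin{pmatrix} 1/\xi & v_i^T \end{pmatrix} N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} + o_p(1)$

         $\;\; = N^{-1/2} \sum_{i=1}^{N} (\varepsilon_i^2 - 1) \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} - \xi^{-1} C_1\, N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} + o_p(1).$

Thus,

         $N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} \to_L N(0,\; \xi^2 \kappa\, C_1^{-1}).$

Easy algebra yields the result.

Unweighted Squared Residuals. Again, from Lemma #1,

         $0 = N^{-1/2} \sum_{i=1}^{N} h(Z_i,\hat\theta)\, \{ r_i^2 - \hat\xi\, h(Z_i,\hat\theta) \} \begin{pmatrix} 1 \\ \hat\xi\, v_i(\hat\theta) \end{pmatrix}$

         $\;\; = N^{-1/2} \sum_{i=1}^{N} \xi\, h^2(Z_i,\theta)\,(\varepsilon_i^2 - 1) \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} - N^{-1} \sum_{i=1}^{N} h^2(Z_i,\theta) \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} \begin{pmatrix} 1 & \xi\, v_i^T \end{pmatrix} N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} + o_p(1),$

so that

         $N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} \to_L N(0,\; \xi^2 \kappa\, C_2^{-1} C_3 C_2^{-1}).$

Since $C_1^{-1} \le C_2^{-1} C_3 C_2^{-1}$, unweighted squared residuals are less efficient
than maximum likelihood.

Weighted Squared Residuals. Again, from Lemma #1,

         $0 = N^{-1/2} \sum_{i=1}^{N} \frac{h(Z_i,\hat\theta)}{h^2(Z_i,\hat\theta_{LS})}\, \{ r_i^2 - \hat\xi\, h(Z_i,\hat\theta) \} \begin{pmatrix} 1 \\ \hat\xi\, v_i(\hat\theta) \end{pmatrix}$

         $\;\; = N^{-1/2} \sum_{i=1}^{N} \xi\,(\varepsilon_i^2 - 1) \begin{pmatrix} 1 \\ \xi\, v_i \end{pmatrix} - C_1\, N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} + o_p(1).$

This shows, as claimed, that

         $N^{1/2} \begin{pmatrix} \hat\xi - \xi \\ \hat\theta - \theta \end{pmatrix} \to_L N(0,\; \xi^2 \kappa\, C_1^{-1}).$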


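Proof of Theorem #3.

By Lemma #2, we will be able to replace $|r_i|$ by $|s_i| - \gamma\, \dot f(x_i,\beta)^T(\hat\beta - \beta)$.
Recall that we are writing $E|\varepsilon| = \eta/\xi^{1/2}$, and that $(E|\varepsilon|)^2 = 1 - \delta$, $\delta = \mathrm{Var}(|\varepsilon|)$.
For unweighted absolute residuals, from Lemma #2 and a Taylor series we
obtain

         $0 = N^{-1/2} \sum_{i=1}^{N} h^{1/2}(Z_i,\hat\theta)\, \{ |r_i| - \hat\eta\, h^{1/2}(Z_i,\hat\theta) \} \begin{pmatrix} 1 \\ \hat\eta\, v_i(\hat\theta)/2 \end{pmatrix}$

         $\;\; = \xi^{1/2} N^{-1/2} \sum_{i=1}^{N} \ell_i\, h(Z_i,\theta)\,(|\varepsilon_i| - E|\varepsilon|) - \gamma\, N^{-1} \sum_{i=1}^{N} \ell_i\, h^{1/2}(Z_i,\theta)\, \dot f^T(x_i,\beta)\, N^{1/2}(\hat\beta - \beta)$
         $\qquad - C_2^*\, N^{1/2} \begin{pmatrix} \hat\eta - \eta \\ \hat\theta - \theta \end{pmatrix} + o_p(1).$

This is the first part of Theorem #3. Noting that for weighted absolute
residuals

         $0 = N^{-1/2} \sum_{i=1}^{N} \frac{h^{1/2}(Z_i,\hat\theta)}{h(Z_i,\hat\theta_{AV})}\, \{ |r_i| - \hat\eta\, h^{1/2}(Z_i,\hat\theta) \} \begin{pmatrix} 1 \\ \hat\eta\, v_i(\hat\theta)/2 \end{pmatrix},$

essentially the same application of Lemma #2 and Taylor series completes
the proof.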


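Proof of Theorem #2. For the symmetric case, $\gamma = 0$. From Theorem #3,

         $N^{1/2}(\hat\theta_{AV} - \theta) \to_L N(0,\; \xi\,\delta\; e\, C_2^{*-1} C_3^* C_2^{*-1} e^T),$

         $N^{1/2}(\hat\theta_{WAV} - \theta) \to_L N(0,\; \xi\,\delta\; e\, C_1^{*-1} e^T).$

Noting that $\xi = \eta^2/(1-\delta)$ and simple algebra completes the proof.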

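Proof of 3.2. Detailed calculations yield

         $\gamma = \Pr(\varepsilon > 0) - \Pr(\varepsilon < 0) = 2e^{-1} - 1$

         $\delta = \mathrm{Var}(|\varepsilon|) = 1 - 4e^{-2}$

         $E|\varepsilon| = 2e^{-1}$

         $E(\varepsilon|\varepsilon|) = 4e^{-1} - 1$

         $N\, \mathrm{Var}(v) = \sum_{i=1}^{N} (v_i - \bar v)^2.$

For any generalized least squares estimate of $\beta$,

         $N^{1/2}(\hat\beta - \beta)/\xi^{1/2} \approx (1/a_0)\, N^{-1/2} \sum_{i=1}^{N} x_i^{1-\theta/2}\, \varepsilon_i.$

Substituting into Theorem #3 yields

         $N^{1/2}(\hat\theta_{WAV} - \theta) \approx \frac{2}{E|\varepsilon|\; \mathrm{Var}(v)}\; N^{-1/2} \sum_{i=1}^{N} \left\{ (v_i - \bar v)(|\varepsilon_i| - E|\varepsilon|) - \gamma\,(b_0/a_0)\, x_i^{1-\theta/2}\, \varepsilon_i \right\}.$

The result now follows immediately.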
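The constants in the example can be recovered from the moments listed in this proof. The sketch below is an illustration, not part of the original argument: it takes the variance of the summand in the last display, averaged over $i$, to be $\mathrm{Var}(v)\,\delta + (b_0^2/a_0)\{\gamma^2 - 2\gamma\, E(\varepsilon|\varepsilon|)\}$, and then applies the squared leading factor $4/\{(E|\varepsilon|)^2\, \mathrm{Var}(v)^2\}$. The variable names are hypothetical.

```python
import math

e_inv = math.exp(-1.0)
gamma = 2.0 * e_inv - 1.0            # Pr(e > 0) - Pr(e < 0)
mean_abs = 2.0 * e_inv               # E|e|
delta = 1.0 - mean_abs ** 2          # Var|e|, using E e^2 = 1
cross = 4.0 * e_inv - 1.0            # E(e|e|) = Cov(|e|, e)

# Coefficient of 1 / Var(v) in the limiting variance.
first = 4.0 * delta / mean_abs ** 2
# Coefficient of b0^2 / (a0 * Var(v)^2) in the limiting variance.
second = 4.0 * (gamma ** 2 - 2.0 * gamma * cross) / mean_abs ** 2

print(round(first, 2), round(second, 2))
```

The first coefficient comes to about 3.39 and the second to about 2.36, close to the values 3.4 and 2.37 quoted in Section 3.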
Abstract

Our focus is the simple linear regression model with measurement
errors in both variables. It is often stated that if the measurement error
in x is "small", then we can ignore this error and fit the model to data
using ordinary least squares. There is some ambiguity in the statistical
literature concerning the exact meaning of a "small" error. For example,
Draper and Smith (1981) state that if the measurement error variance in x
is small relative to the variability of the true x's, then "errors in the
x's can be effectively ignored", see Montgomery & Peck (1983) for a similar
statement. Scheffé (1973) and Mandel (1984) argue for a second criterion,
which may be informally summarized that the error in x should be small
relative to (the standard deviation of the observed Y about the
line)/(slope of the line). We argue that for calibration experiments both
criteria are useful and important, the former for estimation of x given Y
and the latter for the lengths of confidence intervals for x given Y.


1.  Introduction 


There  is  substantial  literature  on  the  problem  of  precision  instrument 
calibration,  see  for  example  Scheffe  (1973),  Rosenblatt  and  Spiegelman 
(1981)  and  Mandel  (1984).  We  will  focus  on  such  calibration  when  fitting  a 
straight  line  to  a  set  of  data  in  which  the  predictor  x  is  measured  with 
error. 

Recently we were asked to try to quantify what is meant by a "small"
measurement error in x, with the idea that, if such error were small, we
could safely ignore it and proceed with ordinary least squares analysis.
In trying to do this we realized that the literature is somewhat ambiguous,
and in fact there are two distinct criteria used to decide when measurement
error in x is small. For example, Draper and Smith (1981, page 124) state
that if the measurement error variance in x is small relative to the
variability of the true x's themselves, then "errors in the x's can be
effectively ignored and the usual least squares analysis performed". This
comment is echoed by Montgomery and Peck (1982, page 388). On the other
hand, both Scheffé (1973, page 2) and Mandel (1984) use the criterion that
we can safely ignore measurement error in x if its standard deviation is
small relative to the ratio

    (Standard deviation of measured Y about the line) / (Slope of the line).

The authors were working in different contexts, so it is not surprising
that their criteria differ.


In this paper, we point out that for calibration experiments both
criteria are useful. The criterion used by Draper and Smith is appropriate
when the goal is estimation of intercept and slope based on the calibration
data set, and then at the second stage for estimating the true value of x
from a new observed Y. The criterion of Scheffé and Mandel addresses the
issue of lengths of confidence intervals for estimating x from an observed
Y. If the Draper and Smith criterion is satisfied while that of Scheffé
and Mandel is not, the effect of ignoring the measurement error in x is
essentially to cause larger confidence intervals for estimating the true
value of x from new observed Y than is necessary.

Suppose that observed responses $\{Y_i\}$ are related linearly to the true
working standards $\{x_i\}$ through the equation

    $Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N.$    (1.1)

Here the deviations $\{\varepsilon_i\}$ combine measurement errors in the response with
equation or model error, and the $\{\varepsilon_i\}$ are normally distributed with mean
zero and common variance $\sigma_\varepsilon^2$.

Rather than observing the true working standards $\{x_i\}$, we observe

    $X_i = x_i + v_i$    (1.2)

where the measurement errors $\{v_i\}$ are assumed normally distributed with
mean zero and variance $\sigma_m^2$. In the terminology of Fuller (1986), the
equation (1.1) includes both equation error and response measurement error.
From now on, when we speak of measurement error we will mean measurement
error in the true $\{x_i\}$.


Assuming the working standards $\{x_i\}$ are measured without error, one
would often proceed as follows. First, perform the usual least squares
analysis, which yields estimates $(\hat\alpha_L, \hat\beta_L, \hat\sigma_L)$. A new, independent
observation $Y^*$ is then made, and the goal is to estimate the value of $x^*$
such that

    $E\,Y^* = \alpha + \beta x^*.$

The maximum likelihood estimator is

    $\hat x^* = (Y^* - \hat\alpha_L)/\hat\beta_L.$

For confidence intervals, the Working-Hotelling $100(1-\alpha)\%$ interval
(Seber (1977)) for the unknown $x^*$ is

    $I = \{ x : Y^* \text{ is contained in the interval } \hat\alpha_L + \hat\beta_L x \pm t_\alpha\, \hat\sigma_L\, R(x) \},$

where $t_\alpha$ is the $1-\alpha/2$ percentage point of the t-distribution with N-2
degrees of freedom, and

    $R^2(x) = 1 + N^{-1}\{ 1 + (x - \bar X)^2/s_X^2 \},$

where $\bar X$, $s_X^2$ are given by

    $\bar X = N^{-1} \sum_{i=1}^{N} X_i, \qquad s_X^2 = N^{-1} \sum_{i=1}^{N} (X_i - \bar X)^2.$

If the calibration is to be repeated, more complex confidence statements
are available for those who wish to use them, see Scheffé (1973).
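The point estimate and the interval $I$ can be computed by solving a quadratic inequality in $x$. The sketch below is an illustration under stated assumptions: the least squares summaries and the t critical value are supplied by the caller (the value 2.101 in the usage lines is the familiar 97.5% point for 18 degrees of freedom), and the function name `calibrate` and all numerical inputs are hypothetical.

```python
import math


def calibrate(y_star, alpha_hat, beta_hat, sigma_hat, x_bar, s2_x, n, t_crit):
    """Return (x_hat, lower, upper) for the calibration problem.

    The interval is {x : |y* - a - b x| <= t * sigma * R(x)} with
    R^2(x) = 1 + (1 + (x - x_bar)^2 / s2_x) / n.  When this set is not a
    bounded interval, (x_hat, None, None) is returned.
    """
    x_hat = (y_star - alpha_hat) / beta_hat
    c = (t_crit * sigma_hat) ** 2
    y0 = y_star - alpha_hat - beta_hat * x_bar
    # Squaring both sides gives, with u = x - x_bar,  A u^2 - 2 B u + C <= 0.
    A = beta_hat ** 2 - c / (n * s2_x)
    B = beta_hat * y0
    C = y0 ** 2 - c * (1.0 + 1.0 / n)
    disc = B * B - A * C
    if A <= 0.0 or disc < 0.0:
        return x_hat, None, None
    root = math.sqrt(disc)
    return x_hat, x_bar + (B - root) / A, x_bar + (B + root) / A


# Illustrative inputs only.
x_hat, lo, hi = calibrate(y_star=5.0, alpha_hat=1.0, beta_hat=2.0,
                          sigma_hat=0.1, x_bar=2.0, s2_x=1.0, n=20,
                          t_crit=2.101)
print(x_hat, lo, hi)
```

When the slope is small relative to $t_\alpha \hat\sigma_L$ the leading coefficient `A` is not positive and the set is unbounded, which is why the function declines to return endpoints in that case.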

Draper and Smith's criterion for the severity of measurement error is

    (measurement error variance in the $\{x_i\}$) / (variation of the $\{x_i\}$).    (1.5)

Scheffé and Mandel propose that the severity of measurement error depends
on the size of

    $(\beta\, \sigma_m/\sigma_\varepsilon)^2.$    (1.6)

In the next section we discuss the criteria (1.5)-(1.6) with regard to
estimation and confidence intervals for $x^*$ given an observed $Y^*$.


2.  The  Effect  of  Small  Error 


The working standards $\{x_i\}$ are fixed constants, and the criterion
(1.5) thus depends on the sample working standards. For large enough
samples, we will think of the mean of the $\{x_i\}$ as converging to $\mu_x$
and the variance of the $\{x_i\}$ also converging (to $\sigma_x^2$), so that
(1.5) can be written as

$$\lambda = \sigma_m^2/\sigma_x^2 . \qquad (2.1)$$


The least squares estimates $(\alpha_L, \beta_L)$ converge in probability to
$(\alpha + \lambda \mu_x \beta/(1+\lambda),\ \beta/(1+\lambda))$ respectively.
By centering appropriately so that $\mu_x \approx 0$, we see that the bias in
least squares essentially depends on the size of $\lambda$ in (2.1). When
$\lambda$ is small, for the purpose of estimation, the effect of ignoring
measurement error in the true $\{x_i\}$ is slight.
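The attenuation of the least squares slope toward $\beta/(1+\lambda)$ can be
checked numerically. The following is a minimal sketch under assumptions
that are not in the text: the true $x_i$ are drawn from a centered normal
distribution as a stand-in for fixed working standards, and the function
name `simulate_ls_fit` and all parameter values are hypothetical.

```python
import random


def simulate_ls_fit(alpha, beta, sigma_x, sigma_m, sigma_eps, n, seed=0):
    """Fit Y on the error-prone X by ordinary least squares."""
    rng = random.Random(seed)
    xs_obs, ys = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, sigma_x)                 # centered: mu_x = 0
        xs_obs.append(x_true + rng.gauss(0.0, sigma_m))  # X = x + v
        ys.append(alpha + beta * x_true + rng.gauss(0.0, sigma_eps))
    xbar = sum(xs_obs) / n
    ybar = sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs_obs)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs_obs, ys))
    slope = s_xy / s_xx
    intercept = ybar - slope * xbar
    return intercept, slope


if __name__ == "__main__":
    lam = 0.5 ** 2 / 1.0 ** 2            # lambda = sigma_m^2 / sigma_x^2
    a_l, b_l = simulate_ls_fit(1.0, 2.0, 1.0, 0.5, 0.5, 50_000, seed=1)
    print("fitted slope:", b_l, "limit beta/(1+lambda):", 2.0 / (1.0 + lam))
    print("fitted intercept:", a_l, "limit (mu_x = 0):", 1.0)
```

With a deliberately large $\lambda = 0.25$ the shrinkage of the slope is
easy to see; as $\lambda$ decreases the fitted slope approaches $\beta$.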

There is no standard method to correct for measurement error when
estimating $(\alpha, \beta, \sigma_\epsilon, \sigma_m)$. For example, when
there is no replication in the experiment, it is customary to assume that
the ratio

$$\theta = \sigma_m^2/\sigma_\epsilon^2 \qquad (2.2)$$

is known, see Kendall & Stuart (1961, pages 375-387) or Fuller (1986). In
some applications, $\theta$ will be known from the physical set-up of the
problem.


For the effect of misspecifying $\theta$, see Lakshminarayanan & Gunst (1984)
and Ketellapper (1983). The basic danger is in thinking that $\theta$ is
larger than it actually is. In practice, if $\theta$ is not known one usually
considers replicating the responses and/or the predictors so as to allow
estimation of $\sigma_m$ and $\sigma_\epsilon$, see Fuller (1986) for a
thorough discussion.

Regardless of whether $\theta$ is known or replication is used, we can make
the following general qualitative statement. When $\lambda$ is small, not
only are the least squares estimators nearly the same as the maximum
likelihood estimators, but in particular the least squares estimators are
approximately unbiased as discussed previously. The story is considerably
different when we turn to confidence intervals. Define

$L_1$ = length of the confidence interval for $x^*$ given $Y^*$ taking
into account the measurement error in $\{x_i\}$.

$L_2$ = length of the confidence interval for $x^*$ ignoring the
measurement error in the $\{x_i\}$.

If we assume that the sample sizes are large enough and, if replication is
used, there are sufficient degrees of freedom in the replication, in
Appendix A we verify that when $\lambda$ is small the ratio of the confidence
interval lengths is approximately

$$L_2/L_1 \approx \left\{1 + \left(\frac{\sigma_m}{\sigma_\epsilon/\beta}\right)^2\right\}^{1/2} . \qquad (2.3)$$


The reason that (2.3) holds is that, as seen in (1.4), the length $L_2$ of
the confidence interval ignoring measurement error is essentially
proportional to $\sigma_L$, which converges in probability to
$(\sigma_\epsilon^2 + \beta^2 \sigma_m^2)^{1/2}$, while the length $L_1$ is
proportional to an estimate of $\sigma_\epsilon$; the ratio of these two
lengths is (2.3).
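The algebra connecting these two limits to the ratio of lengths can be
written out in one line. This is only a restatement of the quantities
already defined, assuming the factor of proportionality is the same for
both intervals so that it cancels; the final form uses the ratio
$\theta = \sigma_m^2/\sigma_\epsilon^2$ of (2.2).

```latex
\begin{align*}
\frac{L_2}{L_1}
  &\approx \frac{(\sigma_\epsilon^2 + \beta^2\sigma_m^2)^{1/2}}{\sigma_\epsilon}
   = \left(1 + \frac{\beta^2\sigma_m^2}{\sigma_\epsilon^2}\right)^{1/2} \\
  &= \left\{1 + \left(\frac{\sigma_m}{\sigma_\epsilon/\beta}\right)^2\right\}^{1/2}
   = (1 + \theta\beta^2)^{1/2} .
\end{align*}
```

The ratio therefore depends on the measurement error only through
$\theta\beta^2$, not through $\lambda$.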

Equation (2.3) verifies the criterion of Scheffé and Mandel that for
confidence intervals, we can ignore measurement error in the working
standards only if the measurement error has variance $\sigma_m^2$ small
relative to $\sigma_\epsilon^2/\beta^2$. In the next section we provide an
example where the criterion (1.5) mentioned by Draper & Smith is small but
the Scheffé and Mandel criterion (1.6) is large.

3. An Example

In Table 1 we list a subset of the data investigated by Lechner, Reeve
& Spiegelman (1982). It is not our purpose to provide a definitive
analysis of these data. Rather, we use the data only to provide a means of
exploring the effect of ignoring small measurement error, especially
through the increased length ratio (2.3). We assume a straight line fit
(1.1) to the data. We find that $\alpha_L = -291.49$, $\beta_L = 2346.64$ and
$\sigma_L = 1.64$. From discussion with the investigators it was thought that
$\sigma_m$ and $\sigma_\epsilon$ are of the same order of magnitude. However,
since $\sigma_\epsilon$ is made up of both response measurement error and
equation error, for this illustration we decided to be rather conservative
as suggested by Lakshminarayanan & Gunst (1984) and Ketellapper (1983) and
set $\theta = 0.001$ in (2.2). Following Kendall & Stuart (1961), the maximum
likelihood estimators of $(\alpha, \beta, \sigma_m)$ assuming $\theta$ is
known are given by


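$$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X} ,$$

$$\hat{\beta} = \frac{(s_Y^2 - \theta^{-1} s_X^2) + \{(s_Y^2 - \theta^{-1} s_X^2)^2 + 4 \theta^{-1} s_{YX}^2\}^{1/2}}{2 s_{YX}} ,$$

$$\hat{\sigma}_m^2 = \tfrac{1}{2} \{s_X^2 - s_{YX}/\hat{\beta}\} ,$$

where

$$s_X^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2 , \qquad
s_Y^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 , \qquad
s_{YX} = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) .$$

It is known that the maximum likelihood estimator for $\sigma_m$ is biased
even in large samples, and it is customary to make the correction

$$\sigma_{m*}^2 = 2 \hat{\sigma}_m^2 .$$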
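For readers who wish to compute such estimates, the sketch below implements
the standard known-ratio (Deming-type) errors-in-variables formulas with
$\theta = \sigma_m^2/\sigma_\epsilon^2$, followed by the factor-of-two
correction. It is an assumption that this form agrees term by term with the
expressions used for the example; the function name `known_ratio_mle` and
the test data are hypothetical, and the sketch requires a nonzero sample
covariance.

```python
import math


def known_ratio_mle(xs, ys, theta):
    """Estimates of (alpha, beta, sigma_m^2) when theta is assumed known.

    theta = sigma_m^2 / sigma_eps^2.  Returns the intercept, the slope, the
    uncorrected maximum likelihood variance estimate, and the corrected one.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs) / n
    s_yy = sum((y - ybar) ** 2 for y in ys) / n
    s_yx = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n

    # Slope: positive-root solution of the known-ratio likelihood equation.
    d = s_yy - s_xx / theta
    beta = (d + math.sqrt(d * d + 4.0 * s_yx ** 2 / theta)) / (2.0 * s_yx)
    alpha = ybar - beta * xbar

    # The likelihood estimate converges to half the true variance, hence
    # the customary doubling.
    sigma2_m_ml = 0.5 * (s_xx - s_yx / beta)
    sigma2_m_corrected = 2.0 * sigma2_m_ml
    return alpha, beta, sigma2_m_ml, sigma2_m_corrected


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 3 + 2x: the slope and intercept
    # are recovered and the variance estimate is (numerically) zero.
    xs = [0.5, 1.0, 1.5, 2.0, 2.5]
    ys = [3.0 + 2.0 * x for x in xs]
    print(known_ratio_mle(xs, ys, theta=0.5))
```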
We found that

$$\beta_* = 2346.64 , \qquad \sigma_{m*} = 6.77 \times 10^{-4} .$$

Making the rough approximations

$$\beta \approx \beta_* , \quad \sigma_m \approx \sigma_{m*} , \quad \sigma_m^2 \approx .001\, \sigma_\epsilon^2$$

and

$$\sigma_x^2 \approx \text{sample variance of observed } X\text{'s} - \sigma_m^2 \approx 0.57 ,$$

we find that $\lambda < 0.001$. Since $\lambda$ is very small and
$\beta_L \approx \beta_*$, we conclude that for purposes of estimation,
measurement error in the $\{x_i\}$ can be effectively ignored. However, the
ratio of the lengths of the confidence intervals for $x^*$ is approximately

$$L_2/L_1 \approx (1 + \theta \beta^2)^{1/2} \approx 74.2 .$$

This large ratio emphasizes our point that the definition of "small
measurement  error"  must  depend  on  whether  one  is  interested  in  estimation 
or  confidence  intervals. 
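The two criteria can be reproduced from the reported figures with a few
lines of arithmetic. In this sketch the variable names are hypothetical,
and reading the reported $6.77 \times 10^{-4}$ as a standard deviation
(rather than a variance) is an assumption, chosen because it is the reading
consistent with $\lambda < 0.001$.

```python
import math

theta = 0.001            # assumed ratio sigma_m^2 / sigma_eps^2 from (2.2)
beta_star = 2346.64      # reported slope estimate
sigma_m_star = 6.77e-4   # reported value, read as a standard deviation (assumption)
sigma2_x = 0.57          # reported approximate variance of the true x's

# Estimation criterion (2.1): lambda = sigma_m^2 / sigma_x^2.
lam = sigma_m_star ** 2 / sigma2_x

# Confidence interval criterion (2.3), written in terms of theta.
length_ratio = math.sqrt(1.0 + theta * beta_star ** 2)

print(f"lambda = {lam:.2e}")
print(f"L2/L1  = {length_ratio:.1f}")
```

The first quantity is far below 0.001 while the second agrees with the
value 74.2 quoted above, which is the contrast the example is meant to show.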

4.  Conclusion 

We  have  shown  that,  under  the  ideal  conditions  of  a  straight  line 
model  and  a  fairly  large-sized  working  sample,  ignoring  measurement  errors 
in  x  which  are  "small"  relative  to  the  usual  estimation  criterion  (2.1)  can 
result  in  calibration  confidence  intervals  which  are  much  larger  than 
necessary.  For  confidence  intervals,  it  is  more  sensible  to  judge 
measurement  error  size  on  the  basis  of  both  (1.5)  and  (2.3).  Ignoring  the 
measurement error in the true working standards $\{x_i\}$ will cause an increase
in  confidence  interval  length  on  the  order  of  (2.3). 

We finish by emphasizing that using measurement error techniques to
obtain shorter calibration confidence intervals requires that equation
(1.1) should hold. While least squares confidence intervals can be very
conservative in examples such as we have studied, they are more robust
against small model misspecifications. Small perturbations from the
straight-line fit can significantly alter the coverage probabilities of the
measurement error confidence interval $I_1$ without greatly affecting the
coverage of the least squares intervals.
Appendix A

In this appendix, we verify the approximation (2.3). While a precise
large-sample analysis is routine, it is also notationally quite cumbersome.
The essential ideas are perhaps easier to understand through the following
heuristic analysis. Suppose that $N$ is large and that $\lambda$ in (2.1) is
small. Assuming that

(A.1) $\sigma_m^2/\sigma_\epsilon^2 = \theta$ known,

then maximum likelihood estimates $(\alpha_*, \beta_*)$ can be formed which
are consistent for $(\alpha, \beta)$, see Fuller (1986). Under the assumption
of small $\lambda$ and large sample size $N$, we have

$$\alpha_L \approx \alpha_* \approx \alpha ; \quad \beta_L \approx \beta_* \approx \beta ;$$

$$R(x) \approx 1 ; \quad \sigma_{\epsilon *} \approx \sigma_\epsilon ; \quad \sigma_L \approx (\sigma_\epsilon^2 + \beta^2 \sigma_m^2)^{1/2} .$$

Here $\sigma_{\epsilon *}$ is the usual consistent estimate of
$\sigma_\epsilon$ under the assumption (2.2). Taking into account the
measurement error in $\{x_i\}$ and using
$(\alpha_*, \beta_*, \sigma_{\epsilon *})$, within our heuristic framework the
appropriate Working-Hotelling confidence interval for $x^*$ is approximately

$$I_1 = \{x : Y^* \in \alpha_* + \beta_* x \pm z_\alpha \sigma_{\epsilon *}\} ,$$

where $z_\alpha$ is the $1-\alpha/2$ standard normal percentage point. The
usual interval formed by ignoring measurement error is approximately

$$I_2 = \{x : Y^* \in \alpha_L + \beta_L x \pm z_\alpha \sigma_L\} .$$

This latter interval is strictly appropriate not for $x^*$ but rather for
$X^* = x^* + v$. The length of the confidence interval $I_1$ taking into
account measurement error in $\{x_i\}$ is, for large samples, proportional to

(A.2) $L_1 \approx 2 \sigma_\epsilon/\beta$

while that for the usual least squares analysis is proportional to

(A.3) $L_2 \approx 2 (\sigma_\epsilon^2 + \beta^2 \sigma_m^2)^{1/2}/\beta .$

The ratio of these lengths is, noting (A.1),

(A.4) $\rho \approx \{1 + (\sigma_m/(\sigma_\epsilon/\beta))^2\}^{1/2} .$


